AITopics | rl problem

Collaborating Authors

rl problem

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Exploration in Structured Reinforcement Learning

Jungseul Ok, Alexandre Proutiere, Damianos Tranos

Neural Information Processing Systems, Feb-14-2026, 19:22:48 GMT

Hence, with large state and action spaces, it is essential to identify and exploit any possible structure existing in the system dynamics and reward function so as to minimize exploration phases and in turn reduce regret to reasonable values. Modern RL algorithms actually implicitly impose some structural properties either in the model parameters (transition probabilities and reward function, see e.g.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country:

Europe > Sweden > Stockholm > Stockholm (0.05)
North America > United States > Illinois (0.04)
North America > Canada > Quebec > Montreal (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)


d01bda31bbcd780774ff15b534e03c40-Supplemental-Conference.pdf

Neural Information Processing Systems, Feb-12-2026, 01:03:20 GMT

Reinforcement learning (RL) algorithms often require a large number of data samples to learn a control policy. As a result, training them directly on the real-world systems is expensive and potentially dangerous.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Texas > Brazos County > College Station (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


d7e4cdde82a894b8f633e6d61a01ef15-Supplemental.pdf

Neural Information Processing Systems, Feb-11-2026, 09:55:58 GMT

algorithm, cost player, mdp, (14 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)


Reward is Enough for Convex MDPs

Neural Information Processing Systems, Feb-11-2026, 09:55:54 GMT

Maximising a cumulative reward function that is Markov and stationary, i.e., defined over state-action pairs and independent of time, is sufficient to capture many

artificial intelligence, machine learning, reinforcement learning, (12 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)
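The snippet for Reward is Enough for Convex MDPs concerns cumulative rewards that are Markov and stationary, i.e., defined over state-action pairs. A standard fact behind this setting is that, in a tabular MDP, the expected discounted return of a fixed policy is linear in the policy's discounted state-action occupancy measure; convex MDPs, as commonly defined, replace this linear objective with a convex function of the occupancy. The minimal sketch below only checks the linear identity numerically. The two-state, two-action MDP, the policy, and the function names are hypothetical illustrations, not taken from the paper.

```python
# Sketch: expected discounted return of a fixed policy equals the inner
# product of its discounted state-action occupancy measure d(s, a) with a
# Markov, stationary reward r(s, a). The toy MDP below is hypothetical.

GAMMA = 0.9
STATES, ACTIONS = range(2), range(2)

# P[s][a][t]: probability of moving from s to t under action a.
P = [[[0.8, 0.2], [0.1, 0.9]],
     [[0.5, 0.5], [0.0, 1.0]]]
R = [[1.0, 0.0],               # R[s][a]: reward
     [0.0, 2.0]]
PI = [[0.6, 0.4], [0.3, 0.7]]  # PI[s][a]: stationary stochastic policy
MU0 = [1.0, 0.0]               # initial state distribution


def value_by_evaluation(iters=2000):
    """Iterative policy evaluation V = r_pi + gamma * P_pi V, then E_mu0[V]."""
    v = [0.0, 0.0]
    for _ in range(iters):
        v = [sum(PI[s][a] * (R[s][a] + GAMMA * sum(P[s][a][t] * v[t] for t in STATES))
                 for a in ACTIONS)
             for s in STATES]
    return sum(MU0[s] * v[s] for s in STATES)


def occupancy(iters=2000):
    """Discounted occupancy: d(s) = mu0(s) + gamma * sum_{u,a} d(u) pi(a|u) P(s|u,a)."""
    d = [0.0, 0.0]
    for _ in range(iters):
        d = [MU0[s] + GAMMA * sum(d[u] * PI[u][a] * P[u][a][s]
                                  for u in STATES for a in ACTIONS)
             for s in STATES]
    # State-action occupancy d(s, a) = d(s) * pi(a|s).
    return [[d[s] * PI[s][a] for a in ACTIONS] for s in STATES]


def value_by_occupancy():
    """Return as a linear function of the occupancy: sum_{s,a} d(s, a) r(s, a)."""
    d = occupancy()
    return sum(d[s][a] * R[s][a] for s in STATES for a in ACTIONS)
```

Both routines use plain fixed-point iteration, which converges here because each update contracts by a factor of GAMMA; for larger problems a linear solver would be the usual choice. The total mass of the unnormalized occupancy is 1 / (1 - GAMMA).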


e894d787e2fd6c133af47140aa156f00-AuthorFeedback.pdf

Neural Information Processing Systems, Feb-10-2026, 22:16:37 GMT

algorithm, assumption, bisimulation, (14 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.49)


Continuous Deep Q-Learning in Optimal Control Problems: Normalized Advantage Functions Analysis

Neural Information Processing Systems, Feb-10-2026, 18:29:40 GMT

One of the most effective continuous deep reinforcement learning algorithms is normalized advantage functions (NAF).

artificial intelligence, machine learning, reinforcement learning, (19 more...)

Neural Information Processing Systems

Country:

Europe > Russia (0.14)
Asia > Russia > Ural Federal District > Sverdlovsk Oblast > Yekaterinburg (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
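The NAF snippet does not say how the method keeps Q-learning tractable with continuous actions. In the standard NAF formulation, Q(s, a) = V(s) + A(s, a) with a quadratic advantage A(s, a) = -1/2 (a - mu(s))^T P(s) (a - mu(s)), where P(s) = L(s) L(s)^T and L(s) is lower-triangular, so the greedy action is simply mu(s). The sketch below evaluates that head for one state. The inputs v, mu, and l_entries are hypothetical stand-ins for a neural network's outputs, and exponentiating the diagonal of L (to keep P positive definite) is an assumed, commonly used choice.

```python
import math


def naf_q(v, mu, l_entries, action):
    """Evaluate a NAF-style Q-value for a single state.

    v: state value V(s); mu: greedy action mu(s) of length n;
    l_entries: n*(n+1)/2 raw entries of lower-triangular L(s), row by row;
    action: the action a to score.
    """
    n = len(mu)
    if len(l_entries) != n * (n + 1) // 2:
        raise ValueError("l_entries must hold n*(n+1)/2 values")
    # Fill L row by row; exponentiating the diagonal makes L invertible,
    # so P = L L^T is positive definite.
    L = [[0.0] * n for _ in range(n)]
    entries = iter(l_entries)
    for i in range(n):
        for j in range(i + 1):
            x = next(entries)
            L[i][j] = math.exp(x) if i == j else x
    diff = [a - m for a, m in zip(action, mu)]
    # (a - mu)^T L L^T (a - mu) is the squared norm of L^T (a - mu).
    lt_diff = [sum(L[i][j] * diff[i] for i in range(n)) for j in range(n)]
    advantage = -0.5 * sum(x * x for x in lt_diff)
    return v + advantage


# Usage: the advantage is zero at mu and negative for any other action.
q_at_mu = naf_q(1.0, [0.5, -0.2], [0.0, 0.3, 0.0], [0.5, -0.2])
q_off = naf_q(1.0, [0.5, -0.2], [0.0, 0.3, 0.0], [1.5, -0.2])
```

Because the maximizer of Q over actions is available in closed form, the max in the Q-learning target needs no inner optimization over a continuous action space.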


Bellman Eluder Dimension: New Rich Classes of RL Problems, and Sample-Efficient Algorithms

Neural Information Processing Systems, Dec-24-2025, 06:51:16 GMT

Finding the minimal structural assumptions that empower sample-efficient learning is one of the most important research directions in Reinforcement Learning (RL). This paper advances our understanding of this fundamental question by introducing a new complexity measure, the Bellman Eluder (BE) dimension. We show that the family of RL problems of low BE dimension is remarkably rich, subsuming a vast majority of existing tractable RL problems, including but not limited to tabular MDPs, linear MDPs, reactive POMDPs, low Bellman rank problems, and low Eluder dimension problems. This paper further designs a new optimization-based algorithm, GOLF, and reanalyzes a hypothesis elimination-based algorithm, OLIVE (proposed in Jiang et al. (2017)). We prove that both algorithms learn near-optimal policies of low BE dimension problems in a number of samples that is polynomial in all relevant parameters but independent of the size of the state-action space. Our regret and sample complexity results match or improve the best existing results for several well-known subclasses of low BE dimension problems.

bellman eluder dimension, new rich class, sample-efficient algorithm, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.78)


Exploration in Structured Reinforcement Learning

Jungseul Ok, Alexandre Proutiere, Damianos Tranos

Neural Information Processing Systems, Nov-20-2025, 20:24:33 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country:

Europe > Sweden > Stockholm > Stockholm (0.04)
North America > United States > Illinois (0.04)
North America > Canada > Quebec > Montreal (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.84)


Policy Transfer for Continuous-Time Reinforcement Learning: A (Rough) Differential Equation Approach

Xin Guo, Zijiu Lyu

arXiv.org Artificial Intelligence, Nov-12-2025

This paper studies policy transfer, one of the well-known transfer learning techniques adopted in large language models, for two classes of continuous-time reinforcement learning problems. For the first class, continuous-time linear-quadratic systems with Shannon entropy regularization (a.k.a. LQRs), we fully exploit the Gaussian structure of their optimal policy and the stability of their associated Riccati equations. For the second class, in which the system has possibly nonlinear and bounded dynamics, the key technical component is the stability of diffusion SDEs, which is established by invoking rough path theory. Our work provides the first theoretical proof of policy transfer for continuous-time RL: an optimal policy learned for one RL problem can be used to initialize the search for a near-optimal policy in a closely related RL problem while maintaining the convergence rate of the original algorithm. To illustrate the benefit of policy transfer for RL, we propose a novel policy learning algorithm for continuous-time LQRs that achieves global linear convergence and local super-linear convergence. As a byproduct of our analysis, we derive the stability of a concrete class of continuous-time score-based diffusion models via their connection with LQRs.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2510.15165

Country: North America > United States (0.14)

Genre: Research Report (1.00)

Industry: Education (0.87)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.61)
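The policy-transfer abstract states that an optimal policy learned for one RL problem can initialize the search in a closely related one. The sketch below illustrates only that warm-start idea, on a scalar, deterministic continuous-time LQR without entropy regularization (dynamics dx/dt = a*x + b*u, cost integral of q*x^2 + r*u^2, linear policy u = -k*x). It uses Kleinman's policy iteration, a classical Newton-type method for the continuous-time Riccati equation, as a stand-in learner; it is not the paper's algorithm, and the coefficients and function names are hypothetical.

```python
import math


def optimal_gain(a, b, q, r):
    """Gain k* = b*p/r from the stabilizing root of 2*a*p - (b*b/r)*p*p + q = 0."""
    p = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
    return b * p / r


def kleinman(a, b, q, r, k0, tol=1e-10, max_iter=100):
    """Policy iteration for scalar continuous-time LQR, starting from gain k0.

    Returns (gain, iterations used). k0 must be stabilizing: a - b*k0 < 0.
    """
    k = k0
    for it in range(1, max_iter + 1):
        a_cl = a - b * k  # closed-loop coefficient under u = -k*x
        if a_cl >= 0:
            raise ValueError("gain is not stabilizing for this system")
        # Policy evaluation: solve the Lyapunov equation 2*a_cl*p + q + r*k^2 = 0.
        p = (q + r * k * k) / (-2.0 * a_cl)
        # Policy improvement: greedy gain for the evaluated cost p*x^2.
        k_new = b * p / r
        if abs(k_new - k) < tol:
            return k_new, it
        k = k_new
    return k, max_iter


# Source problem (a = 1.0) and a closely related target problem (a = 1.1).
k_source = optimal_gain(1.0, 1.0, 1.0, 1.0)
k_cold, cold_iters = kleinman(1.1, 1.0, 1.0, 1.0, k0=10.0)      # arbitrary stabilizing gain
k_warm, warm_iters = kleinman(1.1, 1.0, 1.0, 1.0, k0=k_source)  # transferred gain
```

On these numbers both runs reach the same gain, and the warm start needs fewer iterations. The transferred gain must still stabilize the target system (a - b*k < 0), which the code checks before each evaluation step.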

